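- Detecting fake listings in the city of Ames
- Location: central Iowa, USA, about 50 km north of Des Moines
- Population: about 66,000 (as of 2020)
- Characteristics:
  - Home to Iowa State University (about 30,000 students, roughly 45% of the total population)
  - Stable residential environment and an active rental market (about 55% of households rent)
  - A young, education-centered city (about 40% of residents are aged 20 to 34)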
- Based on the average home price in the Ames area
- Classified into high, mid, and low tiers
- Each point represents an individual house
### Price tier classification by neighborhood

| Price tier | Average price range (USD) | Main neighborhoods |
|---|---|---|
| 🔺 High | ≥ 217,676 | NoRidge, NridgHt, StoneBr, Veenker, Greens, Timber |
| ⚖️ Mid | 136,144 to 217,675 | CollgCr, Mitchel, NAmes, SawyerW, OldTown, Crawfor, Edwards |
| 🔻 Low | ≤ 136,143 | MeadowV, BrDale, IDOTRR, Landmrk, Blueste |

Neighborhoods are classified as high, mid, or low by computing quantiles of the mean SalePrice per Neighborhood.
- Top 25%: high tier; middle 50%: mid tier; bottom 25%: low tier
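
The quantile-based tiering described above can be sketched as follows. This is a minimal illustration, assuming a pandas DataFrame with `Neighborhood` and `SalePrice` columns; the helper name `classify_neighborhoods`, the toy prices, and the treatment of values that fall exactly on a cut-off are assumptions, not the analysis code itself.

```python
import pandas as pd


def classify_neighborhoods(df: pd.DataFrame) -> pd.DataFrame:
    """Label each neighborhood High/Mid/Low by its mean SalePrice."""
    mean_price = df.groupby("Neighborhood")["SalePrice"].mean()
    # 25th and 75th percentiles of the per-neighborhood means.
    q25, q75 = mean_price.quantile([0.25, 0.75])

    def tier(price: float) -> str:
        if price >= q75:
            return "High"
        if price <= q25:
            return "Low"
        return "Mid"

    out = mean_price.rename("MeanSalePrice").reset_index()
    out["Price_Level"] = out["MeanSalePrice"].map(tier)
    return out.sort_values("MeanSalePrice", ascending=False, ignore_index=True)


# Toy data (hypothetical prices, not the Ames figures).
toy = pd.DataFrame({
    "Neighborhood": ["A", "A", "B", "B", "C", "C", "D", "D", "E", "E"],
    "SalePrice": [300_000, 320_000, 210_000, 190_000, 150_000,
                  170_000, 120_000, 100_000, 80_000, 90_000],
})
levels = classify_neighborhoods(toy)
print(levels)
```

Because the cut-offs come from neighborhood means rather than individual sales, every house inherits the tier of its neighborhood regardless of its own price.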
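🔺 High tier
- High average home prices, with a concentration of upscale homes
- Recently built or remodeled, high quality
- Large floor area and full amenities
- Mostly upscale single-family homes in a quiet setting
- Few transactions, but scarcity value
⚖️ Mid tier
- Homes around the Ames average
- Varied housing types (single-family, townhouse, etc.)
- Rental demand from young people and university students
- Good infrastructure, favored by families
- High transaction volume and an active market
🔻 Low tier
- Low average home prices, some aging housing stock
- Maintenance condition is fair to poor
- High share of small rental homes
- Less preferred because of noise, proximity to commercial areas, and similar factors
- Few transactions and limited information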
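📌 Analysis process
Each listing is scored against six conditions, and listings scoring 3 or more are extracted as suspected fake listings.
A regression model is then used to extract suspected fake listings separately, and the listings flagged by both methods are extracted.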
| | Neighborhood | Price_Level |
|---|---|---|
| 0 | NridgHt | High |
| 1 | Timber | High |
| 2 | Somerst | High |
| 3 | NoRidge | High |
| 4 | GrnHill | High |
| 5 | StoneBr | High |
| 6 | Veenker | High |
| 7 | NWAmes | Mid |
| 8 | Blmngtn | Mid |
| 9 | Mitchel | Mid |
| 10 | NAmes | Mid |
| 11 | CollgCr | Mid |
| 12 | SawyerW | Mid |
| 13 | Gilbert | Mid |
| 14 | Sawyer | Mid |
| 15 | Crawfor | Mid |
| 16 | Greens | Mid |
| 17 | ClearCr | Mid |
| 18 | NPkVill | Mid |
| 19 | Blueste | Mid |
| 20 | Landmrk | Mid |
| 21 | SWISU | Low |
| 22 | Edwards | Low |
| 23 | IDOTRR | Low |
| 24 | OldTown | Low |
| 25 | MeadowV | Low |
| 26 | BrDale | Low |
| 27 | BrkSide | Low |
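✔ OverallQual: overall quality
✔ OverallCond: overall condition
✔ GrLivArea: above-ground living area
✔ YearRemodAdd: remodel year
✔ RoomDensity: room density (rooms / area)
✔ Amenities: amenities (extra facilities)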
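
As a small worked example of the derived RoomDensity feature (rooms divided by area), the sketch below assumes the room count comes from a column such as `TotRmsAbvGrd` and the area is GrLivArea in square feet. The function name and the room count of 7 are illustrative assumptions.

```python
def room_density(total_rooms: int, gr_liv_area: float) -> float:
    """Rooms per square foot of above-ground living area."""
    if gr_liv_area <= 0:
        raise ValueError("gr_liv_area must be positive")
    return total_rooms / gr_liv_area


# A house with 1,229 sq ft of living area and an assumed 7 rooms.
density = room_density(7, 1229)
print(round(density, 6))  # 0.005696
```

A higher value means more rooms packed into the same floor area, which is why the feature is treated separately from raw living area.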
High group
Condition flags:
- flag_high_qual: 3
- flag_good_condition: 22
- flag_high_area: 10
- flag_high_remod: 76
- flag_high_density: 93
- flag_high_amenities: 3
Score distribution:
- 0: 76
- 1: 103
- 2: 47
- 3: 2
- 4: 1
Listings with Score ≥ 3: 3
Mid group
Condition flags:
- flag_mid_qual: 2
- flag_good_condition: 366
- flag_mid_area: 53
- flag_mid_remod: 107
- flag_mid_density: 296
- flag_mid_amenities: 37
Score distribution:
- 0: 158
- 1: 348
- 2: 172
- 3: 51
- 4: 4
Listings with Score ≥ 3: 55
Low group
Condition flags:
- flag_low_qual: 4
- flag_good_condition: 30
- flag_low_area: 24
- flag_low_remod: 59
- flag_low_density: 131
- flag_low_amenities: 10
Score distribution:
- 0: 133
- 1: 155
- 2: 30
- 3: 13
- 4: 1
Listings with Score ≥ 3: 14
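
The flag-and-score step summarized above can be sketched roughly as below. The report does not state the cut-off used for each condition, so every value in `THRESHOLDS` is a hypothetical placeholder, as are the function name `score_listings` and the toy rows; only the idea of six boolean flags summed into a score with a cut-off of 3 comes from the text.

```python
import pandas as pd

# Hypothetical thresholds for one price tier; the source does not state them.
THRESHOLDS = {
    "OverallQual": 8,
    "OverallCond": 7,
    "GrLivArea": 2000,
    "YearRemodAdd": 2000,
    "RoomDensity": 0.006,
    "Amenities": 3,
}


def score_listings(df: pd.DataFrame, min_score: int = 3) -> pd.DataFrame:
    """Add one boolean flag per condition, a total score, and a suspect mark."""
    out = df.copy()
    flag_cols = []
    for col, cutoff in THRESHOLDS.items():
        flag = f"flag_{col}"
        out[flag] = out[col] >= cutoff
        flag_cols.append(flag)
    # Each True flag contributes one point to the score.
    out["score"] = out[flag_cols].sum(axis=1)
    out["suspect"] = out["score"] >= min_score
    return out


toy = pd.DataFrame({
    "OverallQual": [9, 5, 6],
    "OverallCond": [8, 5, 7],
    "GrLivArea": [2400, 1200, 1500],
    "YearRemodAdd": [2005, 1970, 2003],
    "RoomDensity": [0.004, 0.005, 0.007],
    "Amenities": [3, 1, 2],
})
scored = score_listings(toy)
print(scored[["score", "suspect"]])
```

In practice the same function would be run once per price tier with tier-specific thresholds, which is what the separate High, Mid, and Low flag counts suggest.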
❗ Condition flag results
72 suspected fake-listing candidates extracted
The map on the right shows how the suspected listings are distributed across groups
| Neighborhood | price_level |
|---|---|
| OldTown | Low |
| Mitchel | Mid |
| Mitchel | Mid |
| MeadowV | Low |
| OldTown | Low |
| NAmes | Mid |
| NAmes | Mid |
| Sawyer | Mid |
| NAmes | Mid |
| NoRidge | High |
| CollgCr | Mid |
| CollgCr | Mid |
| Mitchel | Mid |
| NAmes | Mid |
| Sawyer | Mid |
| Somerst | High |
| NAmes | Mid |
| Crawfor | Mid |
| NAmes | Mid |
| NAmes | Mid |
| MeadowV | Low |
| NAmes | Mid |
| NAmes | Mid |
| NAmes | Mid |
| Crawfor | Mid |
| NAmes | Mid |
| Sawyer | Mid |
| NAmes | Mid |
| BrkSide | Low |
| NAmes | Mid |
| Timber | High |
| NAmes | Mid |
| Crawfor | Mid |
| OldTown | Low |
| NAmes | Mid |
| SawyerW | Mid |
| Sawyer | Mid |
| BrkSide | Low |
| NAmes | Mid |
| OldTown | Low |
| NAmes | Mid |
| Edwards | Low |
| Timber | High |
| SawyerW | Mid |
| NAmes | Mid |
| NAmes | Mid |
| Sawyer | Mid |
| Sawyer | Mid |
| NAmes | Mid |
| NAmes | Mid |
| Mitchel | Mid |
| NAmes | Mid |
| Mitchel | Mid |
| Veenker | High |
| Edwards | Low |
| CollgCr | Mid |
| NAmes | Mid |
| NAmes | Mid |
| NAmes | Mid |
| Sawyer | Mid |
| CollgCr | Mid |
| NAmes | Mid |
| NAmes | Mid |
| NAmes | Mid |
| NAmes | Mid |
| Sawyer | Mid |
| NAmes | Mid |
| NAmes | Mid |
| NAmes | Mid |
| NAmes | Mid |
| SawyerW | Mid |
| OldTown | Low |
| BrkSide | Low |
| NAmes | Mid |
| Veenker | High |
| Veenker | High |
| Sawyer | Mid |
| NAmes | Mid |
| Sawyer | Mid |
| SawyerW | Mid |
| NWAmes | Mid |
| Sawyer | Mid |
| NAmes | Mid |
| Sawyer | Mid |
| NAmes | Mid |
| Sawyer | Mid |
| NAmes | Mid |
| CollgCr | Mid |
| NAmes | Mid |
| NAmes | Mid |
| Sawyer | Mid |
| Crawfor | Mid |
| Blueste | Mid |
| NAmes | Mid |
| Sawyer | Mid |
| Sawyer | Mid |
| Sawyer | Mid |
| Veenker | High |
| Veenker | High |
| NAmes | Mid |
| Mitchel | Mid |
| NAmes | Mid |
| NAmes | Mid |
| NAmes | Mid |
| Sawyer | Mid |
| Mitchel | Mid |
| Sawyer | Mid |
| NAmes | Mid |
| ClearCr | Mid |
| Mitchel | Mid |
| CollgCr | Mid |
| Gilbert | Mid |
| Mitchel | Mid |
| NAmes | Mid |
| Somerst | High |
| Greens | Mid |
| Edwards | Low |
| SawyerW | Mid |
| CollgCr | Mid |
| Sawyer | Mid |
| NAmes | Mid |
| NAmes | Mid |
| Sawyer | Mid |
| NPkVill | Mid |
| SawyerW | Mid |
| Crawfor | Mid |
| NAmes | Mid |
| CollgCr | Mid |
| NAmes | Mid |
| NAmes | Mid |
| Sawyer | Mid |
| Sawyer | Mid |
| NWAmes | Mid |
| NAmes | Mid |
| NAmes | Mid |
| NAmes | Mid |
| Blueste | Mid |
| Sawyer | Mid |
| Mitchel | Mid |
| NAmes | Mid |
| NAmes | Mid |
| NAmes | Mid |
| NWAmes | Mid |
| NAmes | Mid |
| OldTown | Low |
| NAmes | Mid |
| NAmes | Mid |
| NAmes | Mid |
| Sawyer | Mid |
| NAmes | Mid |
| Sawyer | Mid |
| Sawyer | Mid |
| Mitchel | Mid |
| NAmes | Mid |
| OldTown | Low |
| Sawyer | Mid |
| Blueste | Mid |
| IDOTRR | Low |
| OldTown | Low |
| NAmes | Mid |
| NAmes | Mid |
| Mitchel | Mid |
| NAmes | Mid |
| NAmes | Mid |
| NAmes | Mid |
| Sawyer | Mid |
| NAmes | Mid |
| NWAmes | Mid |
| CollgCr | Mid |
| NAmes | Mid |
| NAmes | Mid |
| NAmes | Mid |
| NAmes | Mid |
| NAmes | Mid |
| NAmes | Mid |
| NAmes | Mid |
| NAmes | Mid |
| CollgCr | Mid |
| NoRidge | High |
| Sawyer | Mid |
| NAmes | Mid |
| NAmes | Mid |
| CollgCr | Mid |
| NAmes | Mid |
| OldTown | Low |
| CollgCr | Mid |
| CollgCr | Mid |
| NAmes | Mid |
| NAmes | Mid |
| Sawyer | Mid |
| Mitchel | Mid |
| NAmes | Mid |
| NWAmes | Mid |
| Timber | High |
| Timber | High |
| NPkVill | Mid |
| NAmes | Mid |
| CollgCr | Mid |
| Sawyer | Mid |
| OldTown | Low |
| NAmes | Mid |
| CollgCr | Mid |
| CollgCr | Mid |
| NAmes | Mid |
| NAmes | Mid |
| Blueste | Mid |
| NAmes | Mid |
| NAmes | Mid |
| Edwards | Low |
| NAmes | Mid |
| NWAmes | Mid |
| NAmes | Mid |
| NWAmes | Mid |
| SawyerW | Mid |
| CollgCr | Mid |
| NAmes | Mid |
| Sawyer | Mid |
| Timber | High |
| NAmes | Mid |
| CollgCr | Mid |
| Sawyer | Mid |
| Crawfor | Mid |
| CollgCr | Mid |
| OldTown | Low |
| NAmes | Mid |
| NAmes | Mid |
| Sawyer | Mid |
| NAmes | Mid |
| NAmes | Mid |
| NAmes | Mid |
| Sawyer | Mid |
| NAmes | Mid |
| Blueste | Mid |
| NPkVill | Mid |
| Crawfor | Mid |
| NAmes | Mid |
| NAmes | Mid |
| Mitchel | Mid |
| NAmes | Mid |
| Somerst | High |
| CollgCr | Mid |
| NAmes | Mid |
| NAmes | Mid |
| NAmes | Mid |
| Crawfor | Mid |
| Blueste | Mid |
| Sawyer | Mid |
| Sawyer | Mid |
| NAmes | Mid |
| Veenker | High |
| NAmes | Mid |
| NAmes | Mid |
| NAmes | Mid |
| NAmes | Mid |
| CollgCr | Mid |
| Sawyer | Mid |
| NWAmes | Mid |
| NAmes | Mid |
| Sawyer | Mid |
| CollgCr | Mid |
| NAmes | Mid |
| CollgCr | Mid |
| NAmes | Mid |
| Mitchel | Mid |
| NPkVill | Mid |
| Mitchel | Mid |
| NAmes | Mid |
| Somerst | High |
| NAmes | Mid |
| Mitchel | Mid |
| SawyerW | Mid |
| Crawfor | Mid |
| SawyerW | Mid |
| NAmes | Mid |
| CollgCr | Mid |
| Sawyer | Mid |
| OldTown | Low |
| NAmes | Mid |
| Veenker | High |
| Edwards | Low |
| Mitchel | Mid |
| NAmes | Mid |
| Sawyer | Mid |
| Sawyer | Mid |
| NAmes | Mid |
| Timber | High |
| Sawyer | Mid |
| Mitchel | Mid |
| NAmes | Mid |
| NWAmes | Mid |
| Crawfor | Mid |
| Sawyer | Mid |
| NAmes | Mid |
| Mitchel | Mid |
| NAmes | Mid |
| Sawyer | Mid |
| OldTown | Low |
| NAmes | Mid |
| NAmes | Mid |
| NAmes | Mid |
| NPkVill | Mid |
| NAmes | Mid |
| OldTown | Low |
| Edwards | Low |
| CollgCr | Mid |
| Blueste | Mid |
| NAmes | Mid |
| SawyerW | Mid |
| Sawyer | Mid |
| NAmes | Mid |
| NPkVill | Mid |
| NAmes | Mid |
| NAmes | Mid |
| CollgCr | Mid |
| BrkSide | Low |
| NAmes | Mid |
| Sawyer | Mid |
| NPkVill | Mid |
| NAmes | Mid |
| Mitchel | Mid |
| NAmes | Mid |
| Mitchel | Mid |
| NPkVill | Mid |
| NAmes | Mid |
| NWAmes | Mid |
| NAmes | Mid |
| Sawyer | Mid |
| NAmes | Mid |
| OldTown | Low |
| NWAmes | Mid |
| NAmes | Mid |
| OldTown | Low |
| IDOTRR | Low |
| NAmes | Mid |
| NAmes | Mid |
| NAmes | Mid |
| Mitchel | Mid |
| Mitchel | Mid |
| NAmes | Mid |
| NAmes | Mid |
| Mitchel | Mid |
| NAmes | Mid |
| NAmes | Mid |
| NAmes | Mid |
| Veenker | High |
| NAmes | Mid |
| NAmes | Mid |
| NAmes | Mid |
| NAmes | Mid |
| NAmes | Mid |
| NAmes | Mid |
| Mitchel | Mid |
| NAmes | Mid |
| NWAmes | Mid |
| Blueste | Mid |
| NWAmes | Mid |
| NAmes | Mid |
| NAmes | Mid |
| Crawfor | Mid |
| Mitchel | Mid |
| Timber | High |
| Edwards | Low |
| Sawyer | Mid |
| Gilbert | Mid |
| Sawyer | Mid |
| NWAmes | Mid |
| Crawfor | Mid |
| Sawyer | Mid |
| Crawfor | Mid |
| Timber | High |
| NAmes | Mid |
| Crawfor | Mid |
| NAmes | Mid |
| Sawyer | Mid |
| CollgCr | Mid |
| CollgCr | Mid |
| NAmes | Mid |
| CollgCr | Mid |
| NPkVill | Mid |
| Sawyer | Mid |
| NridgHt | High |
| NAmes | Mid |
| Mitchel | Mid |
| NAmes | Mid |
| NAmes | Mid |
| Sawyer | Mid |
| CollgCr | Mid |
| Mitchel | Mid |
| Crawfor | Mid |
| Veenker | High |
| NAmes | Mid |
| Sawyer | Mid |
| NAmes | Mid |
| NWAmes | Mid |
| NAmes | Mid |
| NPkVill | Mid |
| IDOTRR | Low |
| NAmes | Mid |
| BrDale | Low |
| NAmes | Mid |
| CollgCr | Mid |
| NAmes | Mid |
| Sawyer | Mid |
| Mitchel | Mid |
| Mitchel | Mid |
| NAmes | Mid |
| NAmes | Mid |
| NAmes | Mid |
| NAmes | Mid |
| NAmes | Mid |
| OldTown | Low |
| NAmes | Mid |
| NAmes | Mid |
| Crawfor | Mid |
| StoneBr | High |
| Mitchel | Mid |
| Sawyer | Mid |
| NAmes | Mid |
| Sawyer | Mid |
| NAmes | Mid |
| NAmes | Mid |
| NAmes | Mid |
| Mitchel | Mid |
| OldTown | Low |
| CollgCr | Mid |
| NAmes | Mid |
| Edwards | Low |
| Crawfor | Mid |
| NAmes | Mid |
| BrkSide | Low |
| NAmes | Mid |
| Timber | High |
| Sawyer | Mid |
| NPkVill | Mid |
| Edwards | Low |
| IDOTRR | Low |
| Sawyer | Mid |
| Sawyer | Mid |
| NWAmes | Mid |
| CollgCr | Mid |
| OldTown | Low |
| NAmes | Mid |
| OldTown | Low |
| NAmes | Mid |
| NAmes | Mid |
| NAmes | Mid |
| Crawfor | Mid |
📌 Regression analysis
❗ Applying a regression model to find fake listings makes it possible to compare what matches and what differs from the listings flagged by the scoring method.
✔ Dependent variable: 'SalePrice'
✔ Independent variables: 'OverallQual', 'OverallCond', 'GrLivArea', 'YearRemodAdd', 'RoomDensity', 'amenities'
The independent variables are taken from the six conditions used in the scoring method, so the two approaches can be compared.
🔍 Regression procedure
1. Ridge regression is applied to keep the influence of every variable (Ridge shrinks coefficients without setting any of them to zero).
2. The data is split into 80% training and 20% test sets, and 5-fold cross-validation is performed.
3. Cross-validation secures model stability, and several regularization strengths (α) are tested to select the model with the best predictive performance.
4. 'neg_mean_squared_error' (negative mean squared error) is used as the performance metric.
5. scikit-learn follows the convention that a higher score means a better model, so the error metric is negated.
6. To identify fake listings, the difference between the actual and predicted price (the residual) is computed, and the bottom 2.8% (72/2579) are classified as fake listings.
7. This applies the same proportion as the number of fake listings found by the scoring method, so the results of the two methods can be compared directly.
8. A separate model is built for each price level (Low, Mid, High) so that detection reflects the characteristics of each price range.
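
The fitting procedure described above can be sketched with scikit-learn as follows. This is a minimal illustration on synthetic data rather than the Ames dataset: the feature matrix, coefficients, noise level, and `random_state` values are assumptions, while the 80/20 split, 5-fold cross-validation, the `neg_mean_squared_error` scoring, and the ten-point α grid from 1e-4 to 10 follow the text and the model output.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for one price tier: 6 features, linear price plus noise.
X = rng.normal(size=(200, 6))
true_coef = np.array([30_000, 5_000, 20_000, 8_000, -4_000, 6_000])
y = 180_000 + X @ true_coef + rng.normal(scale=10_000, size=200)

# 80% training data, 20% held-out test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Ten log-spaced regularization strengths from 1e-4 to 10.
alphas = np.logspace(-4, 1, 10)
model = RidgeCV(alphas=alphas, cv=5, scoring="neg_mean_squared_error")
model.fit(X_train, y_train)

print("best alpha:", model.alpha_)
print("test R^2:", round(model.score(X_test, y_test), 3))
```

Fitting one such model per price tier, as the last step describes, simply means repeating this on the rows of each tier.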
RidgeCV(alphas=array([1.00000000e-04, 3.59381366e-04, 1.29154967e-03, 4.64158883e-03,
       1.66810054e-02, 5.99484250e-02, 2.15443469e-01, 7.74263683e-01,
       2.78255940e+00, 1.00000000e+01]),
        cv=5, scoring='neg_mean_squared_error')
Low group
Explanatory power (R²): 0.633
Best α (alpha): 10
▶ Total samples: 662
▶ Suspected fake listings: 19 (2.9%)
| | Neighborhood | SalePrice | predicted | residual |
|---|---|---|---|---|
| 309 | Edwards | 184750 | 341478.416904 | -156728.416904 |
| 2204 | OldTown | 90000 | 165060.186265 | -75060.186265 |
| 469 | OldTown | 122000 | 196637.789395 | -74637.789395 |
| 1909 | OldTown | 97500 | 165711.760154 | -68211.760154 |
| 740 | IDOTRR | 40000 | 100640.665189 | -60640.665189 |
| 116 | OldTown | 159500 | 219067.388226 | -59567.388226 |
| 1214 | OldTown | 107500 | 165796.538413 | -58296.538413 |
| 254 | OldTown | 133900 | 187704.330474 | -53804.330474 |
| 1436 | OldTown | 106000 | 158650.035771 | -52650.035771 |
| 677 | OldTown | 103500 | 155031.742968 | -51531.742968 |
| 374 | BrkSide | 106900 | 158205.245880 | -51305.245880 |
| 1225 | OldTown | 117000 | 165247.245319 | -48247.245319 |
| 2277 | IDOTRR | 123000 | 171019.817386 | -48019.817386 |
| 2025 | OldTown | 117500 | 163763.677821 | -46263.677821 |
| 205 | IDOTRR | 50000 | 95575.542686 | -45575.542686 |
| 528 | IDOTRR | 89500 | 130977.645762 | -41477.645762 |
| 1064 | OldTown | 64500 | 105347.680388 | -40847.680388 |
| 427 | OldTown | 12789 | 53436.186714 | -40647.186714 |
| 22 | MeadowV | 98000 | 137418.527580 | -39418.527580 |
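
The residual-based flagging behind a table like the one above can be sketched as follows. The function name `flag_underpriced` and the toy prices are illustrative, and rounding the per-group count up is an assumption, although it is consistent with the reported counts of 19, 41, and 13 for groups of 662, 1464, and 453 listings.

```python
import numpy as np


def flag_underpriced(actual, predicted, fraction):
    """Return indices of the `fraction` of rows with the most negative residuals."""
    residual = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    n_flag = int(np.ceil(len(residual) * fraction))
    order = np.argsort(residual)  # most negative residual first
    return order[:n_flag], residual


fraction = 72 / 2579  # share of listings flagged by the scoring method
actual = np.array([184_750, 90_000, 150_000, 200_000, 120_000])
predicted = np.array([341_478, 165_060, 148_000, 195_000, 125_000])
idx, residual = flag_underpriced(actual, predicted, fraction)
print(idx, residual[idx])
```

Only negative residuals matter here, since a listing priced far below what its features predict is the pattern being treated as suspicious.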
Mid group
Explanatory power (R²): 0.746
Best α (alpha): 0.0001
▶ Total samples: 1464
▶ Suspected fake listings: 41 (2.8%)
| | Neighborhood | SalePrice | predicted | residual |
|---|---|---|---|---|
| 180 | NWAmes | 82500 | 186512.846249 | -104012.846249 |
| 997 | NAmes | 84900 | 164088.769037 | -79188.769037 |
| 1262 | Sawyer | 112000 | 189448.582663 | -77448.582663 |
| 1703 | Gilbert | 164000 | 237690.994626 | -73690.994626 |
| 607 | SawyerW | 131000 | 203319.538648 | -72319.538648 |
| 232 | NAmes | 97500 | 165959.754915 | -68459.754915 |
| 748 | NWAmes | 154000 | 222452.575310 | -68452.575310 |
| 777 | NAmes | 140000 | 207071.732788 | -67071.732788 |
| 2207 | Sawyer | 158000 | 222634.022503 | -64634.022503 |
| 1735 | NAmes | 180000 | 239892.023341 | -59892.023341 |
| 1777 | Sawyer | 130500 | 189388.251889 | -58888.251889 |
| 328 | NAmes | 100000 | 157610.279322 | -57610.279322 |
| 1592 | NAmes | 152500 | 209835.479856 | -57335.479856 |
| 1973 | NAmes | 110000 | 167110.808559 | -57110.808559 |
| 2399 | Crawfor | 135000 | 191693.656661 | -56693.656661 |
| 2478 | Crawfor | 149000 | 205373.529713 | -56373.529713 |
| 1392 | Mitchel | 115000 | 170705.874780 | -55705.874780 |
| 379 | NAmes | 104900 | 160448.835040 | -55548.835040 |
| 2165 | Crawfor | 137000 | 192003.206946 | -55003.206946 |
| 1085 | NAmes | 132000 | 184829.510696 | -52829.510696 |
| 1790 | NAmes | 139000 | 191731.983243 | -52731.983243 |
| 445 | Sawyer | 112000 | 164354.269125 | -52354.269125 |
| 289 | NAmes | 167000 | 219307.354724 | -52307.354724 |
| 2293 | ClearCr | 148400 | 200423.025348 | -52023.025348 |
| 79 | SawyerW | 67500 | 119207.948646 | -51707.948646 |
| 2044 | NAmes | 133000 | 184566.013933 | -51566.013933 |
| 1259 | NWAmes | 170000 | 219375.681066 | -49375.681066 |
| 1533 | Sawyer | 62383 | 111275.071760 | -48892.071760 |
| 1955 | Crawfor | 191000 | 239871.517587 | -48871.517587 |
| 1557 | SawyerW | 138500 | 186704.367535 | -48204.367535 |
| 478 | Sawyer | 119500 | 166866.155117 | -47366.155117 |
| 2113 | Crawfor | 145000 | 190775.062653 | -45775.062653 |
| 657 | NPkVill | 123000 | 168657.234940 | -45657.234940 |
| 2085 | Gilbert | 115000 | 159734.356699 | -44734.356699 |
| 112 | NWAmes | 185000 | 229329.132298 | -44329.132298 |
| 1276 | NAmes | 143000 | 187007.097643 | -44007.097643 |
| 2397 | NAmes | 242000 | 285697.477460 | -43697.477460 |
| 661 | NAmes | 135000 | 178659.552930 | -43659.552930 |
| 585 | Mitchel | 160000 | 203299.251232 | -43299.251232 |
| 793 | Sawyer | 121500 | 164707.980590 | -43207.980590 |
| 1752 | Blueste | 121000 | 164024.916824 | -43024.916824 |
High group
Explanatory power (R²): 0.730
Best α (alpha): 0.0001
▶ Total samples: 453
▶ Suspected fake listings: 13 (2.9%)
| | Neighborhood | SalePrice | predicted | residual |
|---|---|---|---|---|
| 275 | Veenker | 150000 | 377583.040255 | -227583.040255 |
| 1008 | Timber | 204000 | 331268.591361 | -127268.591361 |
| 1686 | NoRidge | 285000 | 383469.594440 | -98469.594440 |
| 111 | Somerst | 172500 | 267553.768343 | -95053.768343 |
| 1310 | StoneBr | 270000 | 357997.110005 | -87997.110005 |
| 300 | Somerst | 280750 | 363476.267632 | -82726.267632 |
| 1398 | Timber | 202900 | 281247.127845 | -78347.127845 |
| 1278 | Somerst | 170000 | 248322.760527 | -78322.760527 |
| 1495 | NoRidge | 248000 | 324732.678440 | -76732.678440 |
| 949 | NoRidge | 290000 | 364891.692276 | -74891.692276 |
| 51 | Somerst | 193800 | 268322.417291 | -74522.417291 |
| 1411 | StoneBr | 130000 | 204137.130175 | -74137.130175 |
| 2134 | Somerst | 345000 | 416042.158581 | -71042.158581 |
Suspected fake listings (scoring method): 72
Suspected fake listings (regression): 73
Final selected fake listings: 8
| | Neighborhood | PID | SalePrice | score | OverallQual | OverallCond | GrLivArea | YearRemodAdd | RoomDensity | amenities |
|---|---|---|---|---|---|---|---|---|---|---|
| 1752 | Blueste | 909451140 | 121000 | 3 | 6 | 6 | 1229 | 1980 | 0.005696 | 2 |
| 374 | BrkSide | 903225160 | 106900 | 4 | 6 | 9 | 1290 | 2000 | 0.006202 | 3 |
| 2165 | Crawfor | 909254010 | 137000 | 3 | 7 | 8 | 1228 | 1990 | 0.005700 | 2 |
| 2399 | Crawfor | 909254100 | 135000 | 3 | 6 | 8 | 1461 | 1991 | 0.004791 | 2 |
| 2478 | Crawfor | 909275050 | 149000 | 3 | 7 | 6 | 1502 | 2000 | 0.005326 | 2 |
| 2113 | Crawfor | 909275020 | 145000 | 3 | 6 | 6 | 1958 | 1950 | 0.005107 | 2 |
| 2477 | Edwards | 909101010 | 110000 | 4 | 6 | 8 | 1196 | 2000 | 0.006689 | 3 |
| 2085 | Gilbert | 527226020 | 115000 | 3 | 6 | 2 | 1474 | 1952 | 0.005427 | 3 |
| 585 | Mitchel | 923400040 | 160000 | 4 | 6 | 7 | 1750 | 1985 | 0.005143 | 3 |
| 997 | NAmes | 534427010 | 84900 | 3 | 5 | 6 | 1728 | 2001 | 0.006944 | 1 |
| 2044 | NAmes | 534479130 | 133000 | 3 | 6 | 7 | 1578 | 1950 | 0.003802 | 2 |
| 777 | NAmes | 534477110 | 140000 | 3 | 6 | 8 | 1668 | 2005 | 0.004796 | 2 |
| 1085 | NAmes | 535450210 | 132000 | 3 | 6 | 8 | 1224 | 2004 | 0.004902 | 2 |
| 1592 | NAmes | 535353240 | 152500 | 3 | 7 | 7 | 1527 | 1999 | 0.005239 | 2 |
| 1276 | NAmes | 535450310 | 143000 | 3 | 6 | 6 | 1846 | 1950 | 0.004875 | 2 |
| 1790 | NAmes | 535175030 | 139000 | 3 | 6 | 6 | 1632 | 1988 | 0.004902 | 2 |
| 748 | NWAmes | 527352150 | 154000 | 3 | 7 | 6 | 2050 | 1978 | 0.005366 | 2 |
| 1225 | OldTown | 903430090 | 117000 | 3 | 6 | 8 | 1635 | 2003 | 0.003670 | 2 |
| 1909 | OldTown | 903476090 | 97500 | 3 | 7 | 5 | 1864 | 2000 | 0.006438 | 1 |
| 2207 | Sawyer | 905225020 | 158000 | 3 | 5 | 4 | 2654 | 1996 | 0.005275 | 3 |
| 478 | Sawyer | 532351150 | 119500 | 3 | 6 | 6 | 1654 | 1977 | 0.007255 | 1 |
| 793 | Sawyer | 905103060 | 121500 | 3 | 5 | 6 | 1392 | 1996 | 0.005029 | 2 |
| 1777 | Sawyer | 533352170 | 130500 | 3 | 6 | 8 | 1479 | 2005 | 0.006085 | 2 |
| 445 | Sawyer | 905226050 | 112000 | 3 | 5 | 7 | 1416 | 2007 | 0.006356 | 2 |
| 607 | SawyerW | 906226060 | 131000 | 3 | 5 | 7 | 2016 | 2007 | 0.003968 | 1 |
| 1557 | SawyerW | 906425045 | 138500 | 3 | 6 | 8 | 1445 | 1993 | 0.005536 | 2 |
| 1008 | Timber | 916403200 | 204000 | 4 | 6 | 8 | 2237 | 2006 | 0.004470 | 3 |
| 275 | Veenker | 533350090 | 150000 | 3 | 9 | 3 | 2944 | 1977 | 0.004416 | 2 |
📍 The two methods produce different results because they detect fake listings from different perspectives.
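
One way to see where the two methods agree and differ is to compare the sets of flagged listing identifiers (for example, PID values). The sketch below uses made-up IDs and a hypothetical helper `compare_methods`; it is not the procedure used to produce the table above.

```python
def compare_methods(score_ids, regression_ids):
    """Split listing IDs into shared and method-specific groups."""
    score_ids, regression_ids = set(score_ids), set(regression_ids)
    return {
        "both": sorted(score_ids & regression_ids),
        "score_only": sorted(score_ids - regression_ids),
        "regression_only": sorted(regression_ids - score_ids),
    }


# Hypothetical listing IDs, for illustration only.
result = compare_methods([101, 102, 103, 104], [103, 104, 105])
print(result)
```

Listings in the "both" group are the ones flagged from two different perspectives, which is the basis for treating them as the strongest candidates.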
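🏠 Final conclusion
Score-based detection relies on intuitive criteria and can quickly screen out suspicious listings,
while regression-based detection analyzes pricing patterns and allows a more refined judgment.
Used together as complements,
the two methods can identify likely fake listings with greater confidence than either method alone.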